epidemics

Animal diseases in Switzerland

FOAG, Federal Office for Agriculture, collects data on the animal diseases in Switzerland. This data is published as Linked Data.

In this tutorial, we will show how to work with Linked Data. Mainly, we will see how to work with data on animal epidemics.
We will look into how to query, process, and visualize it.

1. Setup
      1.1 Python
      1.2 SPARQL endpoints
      1.3 SPARQL client

2. Data
      2.1 Animal species
      2.2 Diseases
      2.3 Can we link disease to animal type?
      2.4 Reports
      2.5 Example: goats

3. Analysis
      3.1 Cattle: Mucosal Disease
      3.2 Bees: Sauerbrut
      3.3 Rabies
      3.4 Biggest epidemics
      3.5 Epidemics over time

Setup

Python

This notebook requires Python 3.9 or later to run.

SPARQL endpoints

For epidemiological data

Reports on animal diseases are published as Linked Data. It can be accessed with SPARQL queries.
You can send queries using HTTP requests. The API endpoint is https://lindas.admin.ch/query/.

For geodata

Animal diseases are closely linked to places. To understand their location, we will work with Swiss geodata. It is published as Linked Data. It can be accessed using API endpoint under https://ld.geo.admin.ch/query.

SPARQL client

Let's use SparqlClient from graphly to communicate with both databases. Graphly will allow us to:

  • send SPARQL queries
  • automatically add prefixes to all queries
  • format response to pandas or geopandas
In [1]:
import datetime
import itertools
import json

import folium
import mapclassify
import matplotlib as mpl
import matplotlib.cm
import networkx as nx
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from bokeh.io import output_notebook, show
from bokeh.models import (Circle, CustomJS, HoverTool, LabelSet, MultiLine,
                          NodesAndLinkedEdges, Plot, TapTool)
from bokeh.palettes import Paired
from bokeh.plotting import ColumnDataSource, from_networkx
from dateutil.relativedelta import relativedelta
from folium.plugins import TimeSliderChoropleth
from graphly.api_client import SparqlClient
from plotly.subplots import make_subplots
from enum import Enum
from collections import namedtuple
In [2]:
# Uncomment to install dependencies in Colab environment
#!pip install mapclassify
#!pip install git+https://github.com/zazuko/graphly.git
In [3]:
sparql = SparqlClient("https://int.lindas.admin.ch/query")
geosparql = SparqlClient("https://ld.geo.admin.ch/query")

sparql.add_prefixes({
    "schema": "<http://schema.org/>",
    "cube": "<https://cube.link/>",
    "admin": "<https://schema.ld.admin.ch/>",
    "skos": "<http://www.w3.org/2004/02/skos/core#>",
    "disease": "<https://environment.ld.admin.ch/foen/animal-pest/>"   
})

geosparql.add_prefixes({
    "dct": "<http://purl.org/dc/terms/>",
    "geonames": "<http://www.geonames.org/ontology#>",
    "schema": "<http://schema.org/>",
    "geosparql": "<http://www.opengis.net/ont/geosparql#>",
})

SPARQL queries can become very long. To improve the readibility, we will work wih prefixes.

Using the add_prefixes method, we can define persistent prefixes. Every time you send a query, graphly will now automatically add the prefixes for you.

Data

Our goal is to understand outbreaks of animal diseases. First, let's take a look at the animals found in our dataset.

Animal species

In [4]:
query = """
SELECT DISTINCT ?specie ?group
WHERE {
  ?s disease:animal-specie ?specieIRI.
  ?specieIRI schema:name ?specie.
  
  ?specieIRI skos:broader/schema:name ?group.
  
  FILTER (LANG(?specie) = "de")
  FILTER (LANG(?group) = "de")
} 
ORDER BY ?group
"""

df = sparql.send_query(query)
df[0:15]
Out[4]:
specie group
0 Anderes Haustier Andere Haustiere
1 Bienen Bienen
2 Pferd Equiden
3 Fisch Fische
4 Krebs Fische
5 Hausgans Gefluegel
6 Huhn Gefluegel
7 Wachtel Gefluegel
8 Hund Hund
9 Kaninchen Kaninchen
10 Katze Katzen
11 Schildkröte Reptilien
12 Echse Reptilien
13 Schlange Reptilien
14 Rind Rinder

Diseases

Now, let's take a look at the diseases the animals can suffer from.

In [5]:
query = """
SELECT DISTINCT ?epidemics ?group
WHERE {
  ?s disease:epidemics ?epidemicsIRI.
  ?epidemicsIRI schema:name ?epidemics.
  
  ?epidemicsIRI skos:broader/schema:name ?group.

  FILTER (LANG(?epidemics) = "de")
  FILTER (LANG(?group) = "de")
}
ORDER BY ?group
"""

df = sparql.send_query(query)
df.head()
Out[5]:
epidemics group
0 Bovine Virusdiarrhoe / Mucosal Disease Auszurottende Seuchen
1 Brucellose der Rinder Auszurottende Seuchen
2 Tollwut Auszurottende Seuchen
3 Infektiöse Agalaktie Auszurottende Seuchen
4 Bovine spongiforme Enzephalopathie Auszurottende Seuchen

Thus far, we have seen all animals, and all diseases. However, not all animals will catch all diseases.
What diseases are swiss animals exposed to?

In [6]:
query = """
SELECT DISTINCT ?epidemics ?specie ?group
WHERE {
  <https://environment.ld.admin.ch/foen/animal-pest/observation/> cube:observation ?obs .
  ?obs disease:epidemics/schema:name ?epidemics;
       disease:animal-specie ?specieIRI.
       
  ?specieIRI schema:name ?specie.
  ?specieIRI skos:broader/schema:name ?group.
  
  FILTER (LANG(?epidemics) = "de")
  FILTER (LANG(?specie) = "de")
  FILTER (LANG(?group) = "de")
} 

ORDER BY ?specie
"""
df = sparql.send_query(query)
df = df[df.group != "Wildtier"]
In [7]:
# Graph of epidemics-specie relations
graph = nx.from_pandas_edgelist(df, source='epidemics', target='specie')
groups = {key: "epidemics" for key in df.epidemics} | {key: "specie" for key in df.specie}

nx.set_node_attributes(graph, groups, name="group")

colormaps = {"fill_color": 
                {"epidemics": Paired[4][0], "specie": Paired[4][2]}, 
            "hover_color": 
                {"epidemics": Paired[4][1], "specie": Paired[4][3]}
            }

for name, colormap in colormaps.items():
    colors = {k: colormap[v] for k, v in groups.items()}
    nx.set_node_attributes(graph, colors, name=name)
In [8]:
# Interactive graph plot
plot = Plot(plot_width = 1000, plot_height=1000)
plot.title.text = 'Swiss animals and their diseases'

graph_renderer = from_networkx(graph, nx.drawing.layout.bipartite_layout, nodes = df.specie.unique())

# manipulating nodes
nonselect_node = Circle(size = 15, fill_color = "fill_color")
select_node = Circle(size = 15, fill_color = "hover_color")
graph_renderer.node_renderer.glyph = nonselect_node
graph_renderer.node_renderer.nonselection_glyph = nonselect_node
graph_renderer.node_renderer.selection_glyph = select_node
graph_renderer.node_renderer.hover_glyph = select_node
graph_renderer.node_renderer.data_source.data['group'] = [groups[g] for g in graph_renderer.node_renderer.data_source.data["index"]]

# manipulating edges
select_edge = MultiLine(line_color = Paired[7][-1], line_width = 2)
graph_renderer.edge_renderer.glyph = MultiLine(line_color = '#CCCCCC', line_alpha = .5, line_width = 1)
graph_renderer.edge_renderer.selection_glyph = select_edge
graph_renderer.edge_renderer.hover_glyph = select_edge


graph_renderer.selection_policy = NodesAndLinkedEdges()
graph_renderer.inspection_policy = NodesAndLinkedEdges()

plot.renderers.append(graph_renderer)

callback_code = '''
    if (cb_data.index.indices.length > 0) {
        const index = cb_data.index.indices[0];

        if (source.data.group[index] === "specie") {
            hover.tooltips = [["Animal", "@index"]];  
        }
        else {
            hover.tooltips = [["Disease", "@index"]];
        }                                     
    }
'''

hover = HoverTool(
    renderers=[graph_renderer.node_renderer]
)
hover.callback = CustomJS(
    args = dict(source = graph_renderer.node_renderer.data_source, hover = hover),
    code = callback_code
)

pos = {k: v for k, v in graph_renderer.layout_provider.graph_layout.items() if graph.nodes[k]["group"] == "specie"}
names = list(pos.keys())
x, y = map(list, zip(*pos.values()))

source = ColumnDataSource({'x': x, 'y': y, 'name': names})
labels = LabelSet(x_offset=15, y_offset=-6, x='x', y='y', text='name', source=source, background_fill_color='white', text_font_size='10px', background_fill_alpha=.7)
plot.renderers.append(labels)

plot.add_tools(hover, TapTool())

output_notebook()
show(plot)
Loading BokehJS ...

Reports

When animals get sick, the farmer will typically consult a vet. After diagnosis, the incident is reported to the Federal Office for Agriculture. This data is publicly available and can be found in our endpoint:

In [9]:
#Run in browser: https://s.zazuko.com/24fSKE
query = """
SELECT ?diagnosis ?commune ?specie ?stock ?sick ?infected ?killed ?deceased ?epidemics
WHERE {
  <https://environment.ld.admin.ch/foen/animal-pest/observation/> cube:observation ?obs .
  ?obs disease:epidemics/schema:name ?epidemics;
       disease:diagnosis-date ?diagnosis;
       disease:animals-stock ?stock;
       disease:animals-sick ?sick;
       disease:animals-infected ?infected;
       disease:animals-killed ?killed;
       disease:animals-deceased ?deceased;
       disease:internet-publication ?date;
       disease:animal-specie/schema:name ?specie;
       schema:containedInPlace/schema:name ?commune .
  
  FILTER (LANG(?epidemics) = "de")
  FILTER (LANG(?specie) = "de")
} 
ORDER BY DESC(?diagnosis) ?commune
"""
df = sparql.send_query(query)
df.head()
Out[9]:
diagnosis commune specie stock sick infected killed deceased epidemics
0 2021-11-23 Rudolfstetten-Friedlisberg Hund 1 1 0 0 0 Yersiniose
1 2021-11-23 Rüderswil Ziege 1 1 0 1 0 Pseudotuberkulose der Schafe und Ziegen
2 2021-11-22 Corsier-sur-Vevey Katze 1 1 1 0 0 Campylobacteriose
3 2021-11-22 Sâles Rind 1 1 0 0 0 Coxiellose
4 2021-11-21 Hüntwangen Huhn 30 22 0 8 22 Aviäre Influenza, AI

Aggregated reports

The reports give us many details. It tells us not only what specie was affected, but also how many animals got sick, infected, or deceased.

These reports, however, have one limitation. They are sent only once per incident. If a goat gets sick, the vet will examine the entire herd. S/he will check how many goats are sick, or infected. S/he may recommend killing parts or the entire herd. The farmer, or the vet, will then report how many aniamls were sick, infected, killed, or deceased.

But that will happen only once. Nobody will examine this herd again, be it the same week or later. Hence we do not know how a given disease evolved. A seemingly insignificant report may turn out to be a serious issue a few days later. We also do not know at which point in time the report was made. Was it a day, two or ten after the first infection?

The data from the reports raises many questions. To address them, we will work with number of reports. Instead of looking at how individual farms were affected, we will simply count how many farms reported health issues.

Our core data becomes:

In [10]:
query = """
SELECT ?diagnosis ?municipality_id ?municipality ?specie ?epidemics
WHERE {
  <https://environment.ld.admin.ch/foen/animal-pest/observation/> cube:observation ?obs .
  ?obs disease:epidemics/schema:name ?epidemics;
       disease:diagnosis-date ?diagnosis;
       disease:internet-publication ?date;
       disease:animal-specie/schema:name ?specie;
       schema:containedInPlace ?municipality_iri.
       
  ?municipality_iri schema:name ?municipality ;
                    schema:identifier ?municipality_id.
  
  FILTER (LANG(?epidemics) = "de")
  FILTER (LANG(?specie) = "de")
} 
ORDER BY DESC(?diagnosis) ?municipality
"""
df = sparql.send_query(query)
In [11]:
def aggregate_reports(reports: pd.DataFrame, weekly: bool) -> pd.DataFrame:
    df = reports.copy()
    if weekly:
        df["diagnosis"] = df.diagnosis.apply(lambda x: datetime.date(x.year, 1, 1) + relativedelta(weeks=+x.isocalendar()[1]))
    else:
        df['diagnosis'] = df.diagnosis.apply(lambda x: datetime.date(x.year, x.month, 1))

    return df[["diagnosis", "specie", "epidemics", "municipality_id"]].groupby(by=["diagnosis", "specie", "epidemics", "municipality_id"]).size().reset_index(name="reports")
In [12]:
# Aggregate results by week/month
df_monthly = aggregate_reports(df, False)
df_monthly = df_monthly[["diagnosis", "specie", "epidemics"]].groupby(by=["diagnosis", "specie", "epidemics"]).size().reset_index(name="reports")
df_monthly = df_monthly.sort_values(by=["diagnosis", "epidemics", "specie"], ascending=False).reset_index(drop=True)
df_monthly.head()
Out[12]:
diagnosis specie epidemics reports
0 2021-11-01 Hund Yersiniose 1
1 2021-11-01 Kaninchen Virale hämorrhagische Krankheit der Kaninchen 1
2 2021-11-01 Affe Tularämie 1
3 2021-11-01 Wolf Trichinellose 1
4 2021-11-01 Fuchs Toxoplasmose 1

Example: goats

We have seen that one animal can suffer from many diseases. Remember the animal-disease graph? A goat can get sick of twenty different diseases!

Goat diseases

drawing

Some of them are more serious than others. Which ones were the most often reported in the past?

In [13]:
DimensionData = namedtuple('Dimension', ['slice_of', "category"])

class Dimension(Enum):

    Epidemic = DimensionData("epidemics", "specie")
    Specie = DimensionData("specie", "epidemics")

def plot_multiple_categories(df: pd.DataFrame, dim: Dimension, value: str, min_reports: int=15, min_reports_at_time:int = 5) -> None:

    category = dim.value.category
    df_slice = df[df[dim.value.slice_of] == value]
    counts = df_slice[[category, "reports"]].groupby(category).agg(["sum", "max"]).reset_index()
    categories_filtered = list(counts[(counts["reports"]["sum"] >= min_reports) & (counts["reports"]["max"] >= min_reports_at_time)][category])
    df_slice = df_slice[df_slice[category].isin(categories_filtered)]

    all_dates = [x.date() for x in pd.date_range(df_slice.diagnosis.min(), df_slice.diagnosis.max(), freq='MS')]
    combinations = list(itertools.product(*[all_dates, df_slice[category].unique()]))
    values_empty = pd.DataFrame(combinations, columns =['diagnosis', category])
    plot_df = pd.merge(values_empty, df_slice, how="left", on=["diagnosis", category]).fillna(0)

    fig = go.Figure()
    for cat in plot_df[category].unique():
        plot_df_slice = plot_df[plot_df[category] == cat]
        labels = ["<b>{}</b><br>Date: {}<br>Reports: {}".format(cat, row.diagnosis, int(row.reports)) for row in plot_df_slice[["diagnosis", "reports"]].itertuples()]
        fig.add_trace(go.Scatter(x=plot_df_slice.diagnosis, y=plot_df_slice.reports, name=cat,
                            text=labels,
                            hoverinfo='text',
                            line_shape='spline'))

    fig.update_xaxes(rangeslider_visible=True, range=[min(plot_df.diagnosis), max(plot_df.diagnosis)])
    fig.update_yaxes(title="reports", range = [0, int(max(plot_df.reports)*1.1)+1])
    fig.update_layout(
        title={"text": "Epidemics: {}".format(value), "x": 0.5}
    )
    fig.show()
In [14]:
plot_multiple_categories(df_monthly, Dimension.Specie, "Ziege", 15, 5)

Analysis

Some diseases get more serious than others. What were the most reported diseases in the last years?

In [15]:
diseases = df_monthly[["reports", "epidemics"]].groupby("epidemics").sum().reset_index().sort_values(by="reports", ascending=False).reset_index(drop=True)
diseases[:15]
Out[15]:
epidemics reports
0 Bovine Virusdiarrhoe / Mucosal Disease 7221
1 Sauerbrut der Bienen 6130
2 Salmonellose 2449
3 Coxiellose 2200
4 Faulbrut der Bienen 1915
5 Caprine Arthritis-Encephalitis 1599
6 Chlamydienabort der Schafe und Ziegen 1254
7 Campylobacteriose 1211
8 Kryptosporidiose 874
9 Neosporose 774
10 Virale hämorrhagische Krankheit der Kaninchen 680
11 Paratuberkulose 620
12 Tollwut 538
13 Enzootische Pneumonie der Schweine 532
14 Varroa jacobsoni (Milbenkrankheit der Bienen) 519

Cattle: Mucosal Disease

Bovine Virusdiarrhoe / Mucosal Disease seems to be the most serious outbreak. Let's take a loot at the number of reports over time.

We can see that Bovine Virusdiarrhoe has a serious outbreak in 2008-2009. The number of reports remained high for many months after the outbreak. While reports are less frequent now, Bovine Virusdiarrhoe remains present among swiss cows.

In [16]:
disease = "Bovine Virusdiarrhoe / Mucosal Disease"
plot_df = df_monthly[df_monthly.epidemics == disease]
fig = px.bar(plot_df, x="diagnosis", y="reports")
fig.update_layout(
    title={"text": disease, "x": 0.5}
)
fig.show()

Bees: Sauerbrut

Sauerbrut der Bienen is the second most reported animal disease. Taking a look at the frequency of reports, explains why.

Sauerbrut is a seasonal disease. One can observe it first in early spring. Numbers peak around May, and tend to go down afterwards. While we cannot speak about one, big outbreak, sauerbrut is defnitely an issue for apiculturists.

In [17]:
disease = "Sauerbrut der Bienen"
plot_df = df_monthly[df_monthly.epidemics == disease]
fig = px.bar(plot_df, x="diagnosis", y="reports")
fig.update_xaxes(rangeslider_visible=True)
fig.update_layout(
    title={"text": disease, "x": 0.5}
)
fig.show()

Bees: diseases

What about other bee diseases? Are they also seasonal?

In [18]:
plot_multiple_categories(df_monthly, Dimension.Specie, "Bienen", 15, 2)

While Sauerbrut is clearly the biggest issue, it is not the only seasonal bees' disease . Faulbrut also shows seasonality pattern, with peak reports in springtime.

Rabies

Now, let's take a look at rabies.

A histogram tells a clear story: rabies is barely reported any more! Rabies was a serious issue until ~1995. But nowadays, almost no cases of rabies in Switzerland are reported.

In [19]:
disease = "Tollwut"
plot_df = df_monthly[df_monthly.epidemics == disease]
fig = px.bar(plot_df, x="diagnosis", y="reports")
fig.update_layout(
    title={"text": disease, "x": 0.5}
)
fig.show()

Rabies: animals

What animals were main carriers of rabis?

In [20]:
plot_multiple_categories(df_monthly, Dimension.Epidemic, "Tollwut", 15, 2)

The graph shows how effective the oral vaccination of foxes for rabies has been in the middle of the nineties.

Biggest epidemics

Now, let's take a look at the most serious of the last years. What are the patterns? How many epidemics did we have? Are there any trends, or seasonalities?

In [21]:
ROWS=4
COLS=2

biggest_epidemics = diseases[0:ROWS*COLS]
fig = make_subplots(rows=ROWS, cols=COLS, subplot_titles=biggest_epidemics["epidemics"])
for obs in biggest_epidemics.itertuples():
    
    row = obs.Index//COLS + 1
    col = obs.Index%COLS + 1
    plot_df = df_monthly[df_monthly.epidemics == obs.epidemics]
    fig.append_trace(go.Scatter(x=plot_df["diagnosis"], y=plot_df["reports"], name=obs.epidemics, marker_color=px.colors.qualitative.Dark24[0], line_shape="spline"), row=row, col=col)
    
fig.update_layout(height=1200, width=1000, title={"text": "Biggest animal epidemics in Switzerland", "x": 0.5}, showlegend=False)
fig.show()

Epidemics over time

Bovine Virusdiarrhoe is clearly the biggest outbreak of the past years. Hundreds of farms were affected every month, leaving thousands of cows sick.

But how did that outbreak came to be? Where did it start, and how did it spread?

In [22]:
query = """
SELECT ?municipality_id ?municipality ?boundary 

WHERE {
  ?muni_iri dct:hasVersion ?version ;
            geonames:featureCode geonames:A.ADM3 .
  
  ?version schema:validUntil "2020-12-31"^^<http://www.w3.org/2001/XMLSchema#date>;
           schema:name ?municipality;
           geosparql:hasGeometry/geosparql:asWKT ?boundary.
  
  BIND(IRI(REPLACE(STR(?muni_iri), "https://ld.geo.admin.ch/boundaries/", "https://ld.admin.ch/")) AS ?municipality_id)
}
"""
communes = geosparql.send_query(query)
communes = communes.set_crs(epsg=4326)
communes["municipality_id"] = communes.municipality_id.str.split("/").str[-1].astype(int)
communes = communes.set_index("municipality_id")
In [23]:
DISEASE = "Bovine Virusdiarrhoe / Mucosal Disease"
SPECIE = "Rind"
START = datetime.date(2008, 3, 1)
END = datetime.date(2009, 7, 1)
WINDOW = 6

df_weekly = aggregate_reports(df, True)
subset = df_weekly[(df_weekly.specie == SPECIE) & (df_weekly.epidemics == DISEASE) & (df_weekly.diagnosis >= START) & (df_weekly.diagnosis <= END)].copy()
rolling_reports = subset[["diagnosis", "municipality_id", "reports"]].pivot(index="diagnosis", columns="municipality_id", values="reports").sort_index().fillna(0)
rolling_reports = rolling_reports.rolling(WINDOW).sum().iloc[WINDOW-1:,:]

datetime_index = pd.DatetimeIndex(rolling_reports.index)
dt_index_epochs = datetime_index.view("int") // 10**9
rolling_reports.index = dt_index_epochs.astype('U10')
In [24]:
bins = [1,2,5,8,25]
classifier = mapclassify.UserDefined(y=rolling_reports.values[rolling_reports.values > 0], bins=bins)
categories = range(len(bins))
colormap = {cat: mpl.colors.rgb2hex(mpl.cm.PuRd(i/(len(categories)-1))) for i, cat in enumerate(categories)}

styledict = {}
timestamps = rolling_reports.index
for commune in communes.index:
    styledict[commune] = {str(t): {"color": colormap[0], "opacity": 1} for t in timestamps}
    
    if commune in rolling_reports.columns:
        for timestamp in timestamps:
            reports = rolling_reports.loc[timestamp, commune]

            reports_classified = classifier(reports)[0]
            styledict[commune][str(timestamp)] = {"color": colormap[reports_classified], "opacity": 1}


communes["dummy"] = 0
m = folium.Map(location=[46.83, 8.13], zoom_start=8, tiles="cartodbpositron")

g = TimeSliderChoropleth(
    communes.to_json(),
    styledict=styledict
).add_to(m)

folium.Choropleth(
    geo_data=json.loads(communes.to_json()),
    data=communes,
    columns=['municipality', 'dummy'],
    key_on='feature.id',
    fill_color= 'PuRd',
    fill_opacity=0.0,
    line_opacity=0.0,
    bins=[0]+bins,
    legend_name="Cases of {}".format(DISEASE)
).add_to(m)

folium.LayerControl().add_to(m)
m
Out[24]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Thank you for following along and hopefully this notebook was helpful. If you want to get in touch with us, please reach out to us via email.